Stochastic Approximation for Risk-Aware Markov Decision Processes

Authors

Abstract

We develop a stochastic approximation-type algorithm to solve finite state/action, infinite-horizon, risk-aware Markov decision processes. Our algorithm has two loops. The inner loop computes the risk by solving a stochastic saddle-point problem. The outer loop performs $Q$-learning to compute an optimal risk-aware policy. Several widely investigated risk measures (e.g., conditional value-at-risk, optimized certainty equivalent, and absolute semideviation) are covered by our algorithm. Almost sure convergence and the convergence rate of the algorithm are established. For an error tolerance $\epsilon>0$ for the $Q$-value estimation gap and a learning rate $k\in (1/2,\,1]$, the overall convergence rate is $\Omega ((\ln (1/\delta \epsilon)/\epsilon ^{2})^{1/k}+(\ln (1/\epsilon))^{1/(1-k)})$ with probability at least $1-\delta$.
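For the conditional value-at-risk case, the inner loop the abstract describes can be illustrated via the Rockafellar-Uryasev representation, $\mathrm{CVaR}_\alpha(X)=\min_\eta\,\eta+\mathbb{E}[(X-\eta)^+]/(1-\alpha)$, whose minimization is a saddle-point-type problem solvable by stochastic approximation. The sketch below is illustrative only, not the paper's algorithm; the function name and step-size schedule are assumptions.

```python
import numpy as np

def cvar_inner_loop(costs, alpha=0.9, iters=20000, rng=None):
    """Illustrative inner loop: estimate CVaR_alpha of a cost sample by
    stochastic approximation on the Rockafellar-Uryasev representation
        CVaR_alpha(X) = min_eta  eta + E[(X - eta)^+] / (1 - alpha).
    """
    rng = rng or np.random.default_rng(0)
    eta = float(np.median(costs))          # warm start near a quantile
    for t in range(iters):
        x = float(rng.choice(costs))       # one sampled cost per step
        # stochastic subgradient of  eta + (x - eta)^+ / (1 - alpha)  in eta
        grad = 1.0 - float(x > eta) / (1.0 - alpha)
        eta -= grad / (t + 10.0)           # Robbins-Monro step size
    # plug the fitted eta back into the objective over the full sample
    return eta + np.maximum(costs - eta, 0.0).mean() / (1.0 - alpha)
```

In the full two-loop scheme, an outer $Q$-learning iteration would use a risk estimate of this kind in place of the expected next-state value in the Bellman update.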


Similar Articles

Approximate Value Iteration for Risk-aware Markov Decision Processes

We consider large-scale Markov decision processes (MDPs) with a risk measure of variability in cost, under the risk-aware MDPs paradigm. Previous studies showed that risk-aware MDPs, based on a minimax approach to handling the risk measure, can be solved using dynamic programming for small- to medium-sized problems. However, due to the "curse of dimensionality", MDPs that model real-life problem...

A Convex Analytic Approach to Risk-Aware Markov Decision Processes

Abstract. In classical Markov decision process (MDP) theory, we search for a policy that, say, minimizes the expected infinite-horizon discounted cost. Expectation is, of course, a risk-neutral measure, which does not suffice in many applications, particularly in finance. We replace the expectation with a general risk functional, and call such models risk-aware MDP models. We consider minimization ...

Central-limit approach to risk-aware Markov decision processes

Whereas classical Markov decision processes maximize the expected reward, we consider minimizing the risk. We propose to evaluate the risk associated with a given policy over a long-enough time horizon with the help of a central limit theorem. The proposed approach works whether the transition probabilities are known or not. We also provide a gradient-based policy improvement algorithm that conver...
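The central-limit idea in this teaser can be illustrated with a small sketch: simulate rollout totals, fit a normal distribution to them, and read off a tail risk. This is a hedged illustration of the general approach, not the cited paper's method; all names and parameters here are hypothetical.

```python
import math
import numpy as np

def clt_risk(reward_sampler, horizon, threshold, n_rollouts=200, rng=None):
    """Approximate the probability that the cumulative reward over `horizon`
    steps falls below `threshold`, using a normal (central-limit)
    approximation fitted to simulated rollout totals."""
    rng = rng or np.random.default_rng(0)
    totals = np.array([
        sum(reward_sampler(rng) for _ in range(horizon))
        for _ in range(n_rollouts)
    ])
    mu, sigma = totals.mean(), totals.std(ddof=1)
    z = (threshold - mu) / sigma
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```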

Accelerated decomposition techniques for large discounted Markov decision processes

Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...

Markov Decision Processes: Discrete Stochastic Dynamic Programming

The Wiley-Interscience Paperback Series consists of selected books that have been made more accessible to consumers in an effort to increase global appeal and general circulation. With these new unabridged softcover...

Journal

Journal title: IEEE Transactions on Automatic Control

Year: 2021

ISSN: 0018-9286, 1558-2523, 2334-3303

DOI: https://doi.org/10.1109/tac.2020.2989702